1  INTRODUCTION

Systolic arrays are highly parallel computing structures specific to particular computing tasks. They are well suited for reliable and inexpensive implementation using many identical VLSI components. The designs consist of one- and two-dimensional lattices of identical processing elements. Communication of data occurs only between neighboring cells, and control signals propagate through the array like data. These characteristics make it feasible to construct very large arrays.

Several modern methods in digital signal processing require real-time solution of some of the basic problems of linear algebra. Fortunately, systolic arrays have been developed for many of these problems [4, 10, 12]. But several gaps remain: only partially satisfying results have been obtained for the eigenvalue and singular value decompositions, for example.


Here we consider a systolic array for the singular value decomposition (SVD). An SVD of an m x n (m ≥ n) matrix A is a factorization

$$A = U \Sigma V^T,$$

where U is m x n with orthonormal columns, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, and V is orthogonal. There are many important applications of the SVD [1, 6, 13].

There have been several earlier investigations of parallel SVD algorithms and arrays. First, Finn, Luk, and Pottle describe a systolic structure of $n^2/2$ processors and two algorithms that use it, but the convergence of their algorithms has not been proved and may be slow [3]. Heller and Ipsen [8] describe an array for computing the singular values of a banded matrix with bandwidth w; they use O(w) processors and $O(wn^2)$ time. Brent and Luk [2] describe an n/2-processor linear array that implements a one-sided orthogonalization method and converges reliably in O(n log n) time. Unfortunately, the processors in this array are quite complex, and it is not clear that matrices with more than n columns can be efficiently accommodated.


In this paper we discuss two topics. First, we show how an architecture for computing the eigenvalues of a symmetric matrix can be modified to compute singular values and vectors. Second, we discuss the implementation of these systolic eigenvalue and SVD arrays using VLSI chips.

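The SVD is often used to regularize ill-conditioned problems. In these problems there are p < n large singular values and n - p that are much smaller. What is needed is the pseudoinverse of the rank p matrix closest (with respect to the 2-norm) to A, namely $A_{(p)} = U_p \Sigma_p V_p^T$:

$$A_{(p)}^{+} = V_p \Sigma_p^{-1} U_p^T,$$

where $U_p$ and $V_p$ hold the first p columns of U and V, and $\Sigma_p = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$.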

We have recently developed a new algorithm to compute $A_{(p)}^{+}$ that involves nothing but a sequence of matrix-matrix products, for which systolic arrays are well known (see, e.g., [10]). An alternate form of the algorithm can be used to compute the related orthogonal projection matrix $A_{(p)}^{+} A_{(p)} = V_p V_p^T$.


2  AN  SVD  ARCHITECTURE 

Let A be a given matrix. The singular values of A will be obtained in two phases:

1.  A is reduced to an upper triangular matrix B with bandwidth k+1,

    $b_{ij} = 0$ if $i > j$ or $i < j - k$,

    and B = QAP, where Q and P are orthogonal.

2.  B is diagonalized by an iterative process equivalent to implicitly shifted QR iteration on $B^T B$.

With k = 1 this is the standard method of Golub and Reinsch [7]. The reason for allowing k > 1 is an increase in parallelism. In phase 1, kn processors are employed and the time is O(mn/k). In phase 2, 2k processors are used and the time per iteration is 6n + O(k).

2.1  Reduction to banded form

The reduction step uses a k x n trapezoidal array that has been described in detail previously [12]. Let the m x n matrix X be partitioned as

$$X = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix},$$

where $X_{11}$ is k x k. The array applies a sequence of Givens rotations to the rows of X to zero the first k columns below the main diagonal. If Q is the product of these rotations, then

$$QX = \begin{bmatrix} R_1 & Y_1 \\ 0 & Y_2 \end{bmatrix},$$

where $R_1$ is k x k upper triangular. $R_1$, $Y_1$, $Y_2$, and the parameters of the rotations that make up Q all flow out from the array. The time required is m. (Here and below we give "times" in units of the time required for an individual cell in the array to carry out its computation.)


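Now let A be the given matrix. Send A through the array to produce

$$Q_1 A = \begin{bmatrix} R_{11} & C_{12} \\ 0 & C_{22} \end{bmatrix}.$$

Next send $[C_{12}^T,\ C_{22}^T]$ through to produce

$$P_1 \begin{bmatrix} C_{12}^T & C_{22}^T \end{bmatrix} = \begin{bmatrix} \begin{matrix} L_{12}^T \\ 0 \end{matrix} & A_{22}^T \end{bmatrix},$$

where $L_{12}$ is k x k lower triangular. (Although this input matrix has m columns, the array can handle its factorization in time m by making $\lceil m/n \rceil$ passes over the data [12].) Now continue this process using $A_{22}$ in place of A. After $J = \lceil n/k \rceil$ such steps we have produced a k+1 diagonal, upper triangular matrix B,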


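$$B = \begin{bmatrix} R_{1,1} & L_{1,2} & & \\ & R_{2,2} & L_{2,3} & \\ & & \ddots & \ddots \\ & & & R_{J,J} \end{bmatrix},$$

such that B = QAP, where Q and P are orthogonal. The total time used is $mJ \approx mn/k$.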


The required transposition of data can be done by a specialized switching device, a "systolic shifter," described earlier [12].
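The block steps above can be mimicked serially. The following Python sketch is an illustration under our own assumptions, not the trapezoidal array's schedule. It uses dense lists and applies one Givens rotation at a time, and the helper names (`givens`, `rotate_rows`, `rotate_cols`, `band_reduce`) are hypothetical. Each pass has two parts. The left step zeros the next k columns below the diagonal. The right step is the same operation on the transposed block row, and it leaves the k x k lower triangular block $L_{j,j+1}$. The rotations are also accumulated into identity matrices of orders m and n, so that B = QAP can be checked.

```python
import math


def givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)


def rotate_rows(M, i, j, c, s):
    """Replace rows i, j of M by c*row_i + s*row_j and -s*row_i + c*row_j."""
    for col in range(len(M[0])):
        a, b = M[i][col], M[j][col]
        M[i][col], M[j][col] = c * a + s * b, -s * a + c * b


def rotate_cols(M, i, j, c, s):
    """Apply the same rotation to columns i and j of M."""
    for row in M:
        a, b = row[i], row[j]
        row[i], row[j] = c * a + s * b, -s * a + c * b


def band_reduce(A, k):
    """Reduce an m x n matrix A (m >= n) to upper triangular form with
    bandwidth k+1. Returns (B, Q, P) with B = Q A P, Q and P orthogonal."""
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for start in range(0, n, k):
        stop = min(start + k, n)
        # Left step (row rotations): zero columns start..stop-1 below the diagonal.
        for col in range(start, stop):
            for row in range(col + 1, m):
                c, s = givens(B[col][col], B[row][col])
                rotate_rows(B, col, row, c, s)
                rotate_rows(Q, col, row, c, s)
        # Right step (rotations of columns >= start+k only): zero the entries
        # right of the kth superdiagonal in rows start..stop-1.
        for row in range(start, stop):
            for col in range(row + k + 1, n):
                c, s = givens(B[row][row + k], B[row][col])
                rotate_cols(B, row + k, col, c, s)
                rotate_cols(P, row + k, col, c, s)
    return B, Q, P
```

For example, `band_reduce(A, 1)` returns an upper bidiagonal B. This is the k = 1 case that corresponds to the method of Golub and Reinsch.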


When singular vectors are to be computed, the rotations generated by the array may be applied to identity matrices of order m and n. This can be done by the array. These matrices accumulate the product of the rotations used, that is, the orthogonal matrices Q and P above.

2.2  QR Iteration

Now we consider QR iteration to get the singular values of B, and hence those of A. We shall generate a sequence of matrices $\{B^{(i)}\}$ that have the same structure as B and converge to a diagonal matrix:

$$B^{(0)} = B, \qquad B^{(i+1)} = P^{(i)} B^{(i)} Q^{(i)},$$

where $P^{(i)}$ and $Q^{(i)}$ are orthogonal.

First we consider QR iteration on $B^T B$ without shifts. This can be realized by the following procedure:

1.  Find $Q^{(i)}$ such that $L^{(i)} = B^{(i)} Q^{(i)}$ is lower triangular.

2.  Find $P^{(i)}$ such that $B^{(i+1)} = P^{(i)} L^{(i)}$ is upper triangular.

Indeed, $B^{(i)T} = Q^{(i)} L^{(i)T}$ and $L^{(i)T} B^{(i)}$ is upper triangular, so $B^{(i)T} B^{(i)} = Q^{(i)} \bigl(L^{(i)T} B^{(i)}\bigr)$ is a QR factorization. Reversing the factors gives $L^{(i)T} B^{(i)} Q^{(i)} = L^{(i)T} L^{(i)} = B^{(i+1)T} B^{(i+1)}$, which is one unshifted QR step.

Both steps of this procedure can be carried out by the Heller-Ipsen (HI) array [8], a k x w rectangular array for QR factorization of w-diagonal matrices. In this array, plane rotations are generated at the left edge and move to the right, affecting a pair of matrix rows. Take w = k+1. $B^{(i)}$ enters the array at the bottom, each diagonal entering, one element at a time, into one of the processors. The array annihilates the elements of the upper triangle of $B^{(i)}$, which causes fill-in of k diagonals in the lower triangle. The resulting matrix emerges from the top in the same diagonal-per-processor format and immediately enters a second array. This array annihilates the lower triangle of $L^{(i)}$, and the resulting upper triangular matrix $B^{(i+1)}$ emerges from the top (Fig. 1).

The time is 2n+4k per iteration. Element $b_{nn}$ enters the bottom array at time 2n, leaves its upper left corner at time 2n+2k, and leaves the top array at time 2n+4k.

Unshifted QR converges slowly: $b_{ij}$ (j > i) tends to 0 only linearly, roughly by a factor $(\sigma_j/\sigma_i)^2$ per iteration. In some situations this may be adequate, and the simplicity of the structure used is then a real advantage.

It is also easy to pipeline the iterations. As $B^{(i+1)}$ comes out of the second array, it can be sent directly into another pair of arrays to begin the (i+1)th iteration, and so on. As many as n/4k iterations can be effectively pipelined. Any more and the pipe length exceeds n, so that the pipe never gets full. Suppose we choose $k = O(n^{1/2})$ and pipeline $n/4k = O(n^{1/2})$ iterations. Then the number of processors in all the arrays is $O(n^{3/2})$, and the total time is also $O(n^{3/2})$, assuming O(n) iterations of QR are required. These considerations also apply to the array implementation of the implicitly shifted QR algorithm discussed below, with one important proviso: when pipelining the iterations, some strategy for choosing several shifts in advance must be used.
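As a serial check of the two-step procedure, the following Python sketch performs both triangularizations with dense Givens rotations. It ignores the band structure that the HI arrays exploit and is not the arrays' schedule, and the function names are ours, for illustration only. Repeating the step on the 2 x 2 example B = [[3, 1], [0, 2]] drives the diagonal toward the singular values $\sqrt{7 \pm \sqrt{13}}$, because $B^T B = [[9, 3], [3, 5]]$.

```python
import math


def givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)


def to_lower(B):
    """Step 1: column rotations give L = B Q, lower triangular."""
    n = len(B)
    L = [row[:] for row in B]
    for i in range(n):
        for j in range(i + 1, n):
            # Rotate columns i and j so that L[i][j] becomes zero.
            c, s = givens(L[i][i], L[i][j])
            for row in L:
                a, b = row[i], row[j]
                row[i], row[j] = c * a + s * b, -s * a + c * b
    return L


def to_upper(L):
    """Step 2: row rotations give B' = P L, upper triangular."""
    n = len(L)
    B = [row[:] for row in L]
    for j in range(n):
        for i in range(j + 1, n):
            # Rotate rows j and i so that B[i][j] becomes zero.
            c, s = givens(B[j][j], B[i][j])
            rj, ri = B[j], B[i]
            B[j] = [c * a + s * b for a, b in zip(rj, ri)]
            B[i] = [-s * a + c * b for a, b in zip(rj, ri)]
    return B


def unshifted_qr_step(B):
    """One unshifted QR step on B^T B, carried out on the factor B itself."""
    return to_upper(to_lower(B))
```

Since $B'^T B' = L^T L$, the Frobenius norm and $|\det B|$ are preserved at every step. On a bidiagonal input the result stays bidiagonal up to rounding, which matches the statement that $B^{(i+1)}$ has the same structure as $B^{(i)}$.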


2.2.1  Implicitly  shifted  QR  iteration 

To obtain adequate convergence speed we need to incorporate shifts. Following Stewart [14], suppose that one QR iteration with shift λ is performed on $B^T B$, and that the orthogonal matrix so generated is Q. Then proceed as follows:

1.  Let $Q_0$ be any orthogonal matrix whose first k columns are the same as those of Q.

2.  Using the same technique as in Section 2.1, reduce $BQ_0$ to upper triangular, k+1 diagonal form, yielding a matrix B'.


It can be shown that $B'^T B'$ is the matrix that would result from one QR step with shift λ applied to $B^T B$.


Using the trapezoidal array as described above to carry out step 2 would be inefficient. Rather, we proceed as follows. $Q_0$ is composed of plane rotations that zero the first k columns of $B^T B - \lambda I$ below the main diagonal. Applying $Q_0$ to B causes fill-in in the k diagonals below the main diagonal, confined to rows 2, 3, ..., 2k. See Fig. 2 for the case k = 2.


Figure 1

Unshifted QR iteration with two Heller-Ipsen arrays

Figure 2

Structure of $BQ_0$, k = 2

Let the first 2k rows of $BQ_0$ be sent into a k x (2k+1) HI array. By a sequence of plane rotations applied to the rows, the array removes the "bulge" in the lower triangle, adding a bulge of the same shape in the first 3k columns of the upper triangle. This data flows directly into another k x (2k+1) HI array. That array removes the elements to the right of the kth superdiagonal, which causes a new bulge to appear in the lower triangle, in columns k+1 through 3k-1 and extending to row 3k. (The second HI array is the mirror image of the first: rotations are generated at its right edge and move left, affecting pairs of matrix columns.) Let $P_1$ and $Q_1$ be the orthogonal matrices implicitly used by the two HI arrays. The matrix emerging from the second array is $P_1 B Q_0 Q_1$, and it has the form

$$P_1 B Q_0 Q_1 = \begin{bmatrix} B_1 \\ \begin{matrix} 0 & B_2 \end{matrix} \end{bmatrix},$$

where $B_1$ is a k x n upper triangular, k+1 diagonal matrix, and $B_2$ is an (n-k) x (n-k) matrix of the same form as $BQ_0$. Fig. 3 illustrates this for k = 2.

Figure 3

"Chasing the bulge" with two k x (2k+1) Heller-Ipsen arrays

Now we do exactly the same thing to $B_2$, and so on. This yields orthogonal matrices $P_j$ and $Q_j$, j = 2, ..., J, where $J = \lceil (n-1)/k \rceil$. Finally,

$$B' = P_J \cdots P_1 \, B \, Q_0 Q_1 \cdots Q_J$$

is the matrix we require.

The time needed is 6n. It takes 2k steps for an HI array to start producing output, so the second array starts its output at time 4k. The first element of $B_j$ comes out at time 6k; it is the (k+1)st element of the main diagonal to come out of the second array. By this time the first array's inputs have become idle, so this element can immediately reenter. Therefore one step, from $B_j$ to $B_{j+1}$, takes time 6k. There are $\lceil (n-1)/k \rceil$ such steps, hence about 6n time for the whole process.
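The next Python sketch checks the mathematics of steps 1 and 2 with dense matrices. It is not the bulge-chasing schedule. Step 2 is done with a generic band re-reduction in the style of Section 2.1, which is the inefficient route the text avoids, and $B^T B - \lambda I$ is formed explicitly to find $Q_0$. All function names are our own. The column rotations in the re-reduction never touch the first k columns, so the overall right transformation keeps the first k columns of Q, as step 1 requires. We only claim a precise comparison for k = 1. There, by the implicit Q theorem, $B'^T B'$ matches an explicit shifted QR step on $B^T B$ up to the signs of its off-diagonal entries.

```python
import math


def givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)


def rotate_rows(M, i, j, c, s):
    for col in range(len(M[0])):
        a, b = M[i][col], M[j][col]
        M[i][col], M[j][col] = c * a + s * b, -s * a + c * b


def rotate_cols(M, i, j, c, s):
    for row in M:
        a, b = row[i], row[j]
        row[i], row[j] = c * a + s * b, -s * a + c * b


def restore_band(B, k):
    """Dense re-reduction of square B to upper triangular, k+1 diagonal form.
    Column rotations only involve columns >= k, so the first k columns of
    the accumulated right transformation are never changed."""
    n = len(B)
    for start in range(0, n, k):
        stop = min(start + k, n)
        for col in range(start, stop):
            for row in range(col + 1, n):
                c, s = givens(B[col][col], B[row][col])
                rotate_rows(B, col, row, c, s)
        for row in range(start, stop):
            for col in range(row + k + 1, n):
                c, s = givens(B[row][row + k], B[row][col])
                rotate_cols(B, row + k, col, c, s)
    return B


def implicit_shifted_step(B, k, shift):
    """Dense sketch of one implicitly shifted step on the band factor B."""
    n = len(B)
    # B^T B - shift*I, formed explicitly here only for clarity.
    M = [[sum(B[t][i] * B[t][j] for t in range(n)) - (shift if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    BQ0 = [row[:] for row in B]
    # Q0: the rotations that zero the first k columns of M below the diagonal.
    for col in range(k):
        for row in range(col + 1, n):
            c, s = givens(M[col][col], M[row][col])
            rotate_rows(M, col, row, c, s)
            rotate_cols(BQ0, col, row, c, s)  # accumulates B Q0
    return restore_band(BQ0, k)
```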

2.3  Complex  matrices 

In signal processing applications, complex matrices often arise. Here we discuss the algorithms to be used for QR iteration with complex matrices. Essentially, we show that the plane rotations used can be of a special form:

$$x' = \bar{c}\,x + \sigma y, \qquad y' = -\sigma x + c\,y, \tag{1}$$

where x, y, and c are complex, σ is real, and $|c|^2 + \sigma^2 = 1$. A fully complex plane rotation has a complex s in place of σ. The special form saves 1/4 of its multiplications: 12 real multiplications are used instead of 16. We shall call these c,σ rotations.
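A c,σ rotation can be checked numerically with Python's built-in complex numbers. This sketch makes two assumptions of ours, beyond the naming of the functions. The element y to be annihilated is real, and the parameters are chosen as $c = x/r$ and $\sigma = y/r$ with $r = (|x|^2 + y^2)^{1/2}$. With this choice the rotation maps (x, y) to (r, 0). It also preserves the Euclidean norm of any other complex pair, since $|c|^2 + \sigma^2 = 1$ makes it unitary.

```python
import math


def make_c_sigma(x, y):
    """Parameters of a c,sigma rotation that zeros the real y against complex x.

    Returns (r, c, sigma) with r = (|x|^2 + y^2)^(1/2) real and sigma real."""
    r = math.sqrt(abs(x) ** 2 + y * y)
    if r == 0.0:
        return 0.0, 1.0 + 0.0j, 0.0  # nothing to annihilate: identity rotation
    return r, x / r, y / r


def apply_c_sigma(x, y, c, sigma):
    """Rotation (1): x' = conj(c) x + sigma y,  y' = -sigma x + c y."""
    return c.conjugate() * x + sigma * y, -sigma * x + c * y
```

For x = 3 - 4i and y = 12, this gives r = 13, c = (3 - 4i)/13 and σ = 12/13. Applying the rotation yields (13, 0).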

It is possible to compute the SVD of a complex m x n matrix $A = A_R + iA_I$ using real arithmetic. One finds the SVD of the 2m x 2n real matrix

$$\begin{bmatrix} A_R & -A_I \\ A_I & A_R \end{bmatrix}.$$

Among the 2n singular values, each singular value of A occurs twice, and the singular vectors are of the form $(x_R^T, x_I^T)^T$, where $x = x_R + ix_I$ is a singular vector of A. But the cost is much greater. In units where the cost of an m x n real SVD is one, the real 2m x 2n SVD costs 8. The complex m x n approach costs 3, not 4, since the use of c,σ rotations saves 1/4 of the work.
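The real formulation is easy to state in code. This Python sketch uses names of our own. It builds the 2m x 2n matrix and checks that it acts on $(x_R, x_I)$ exactly as A acts on $x = x_R + ix_I$. The factor 8 follows from the usual $O(mn^2)$ operation count of an SVD: $(2m)(2n)^2 = 8mn^2$.

```python
def real_embedding(AR, AI):
    """Return the 2m x 2n real matrix [[AR, -AI], [AI, AR]] for A = AR + i*AI."""
    top = [list(r) + [-v for v in s] for r, s in zip(AR, AI)]
    bottom = [list(s) + list(r) for r, s in zip(AR, AI)]
    return top + bottom


def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]
```

For the 1 x 1 matrix A = 3 + 4i, the embedding is [[3, -4], [4, 3]]. This is 5 times a rotation, so both of its singular values equal |A| = 5, which illustrates the doubling.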

We now show that c,σ rotations suffice.

To start, we note that the banded matrix B produced by the reduction phase can always be chosen to have positive real elements on its main and kth superdiagonals. Indeed, the reduction B = QAP to k+1 diagonal, upper triangular form is not unique:

$$\tilde{B} = D_1 B D_2 = (D_1 Q) A (P D_2)$$

is also such a reduction for any unitary diagonal matrices $D_1$ and $D_2$. These can always be chosen to give $\tilde{B}$ the stated property. In fact, the trapezoidal array can do this automatically [12]. Suppose it generates a rotation to zero some matrix element, for instance the second element of the pair (x, y). It then chooses the parameters so that the result of the rotation is the pair

$$\bigl((|x|^2 + |y|^2)^{1/2},\ 0\bigr).$$

Furthermore, the elements to be zeroed are the real elements resulting from previous rotations. For this reason, the rotations that do the zeroing can be taken to be c,σ rotations.

Now we look at the second phase. Because of the structure of B, the main, kth super-, and kth subdiagonals of $B^H B$ are all real. The rotations that comprise $Q_0$ can be taken to be c,σ rotations, since they zero real elements. By keeping track of the locations of real elements, one can show that all elements of the outer diagonals of $BQ_0$ are real. Again, because the elements to be annihilated are real, c,σ rotations can be used to eliminate the bulge. A matrix with the same structure as $BQ_0$ results, and the proof therefore follows by induction.


2.4  An  alternate  scheme 


Gene Golub has pointed out that the eigenvalues of the 2n x 2n matrix

$$C = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}$$

are the singular values of A taken with positive and negative sign. Moreover, if $(x^T, y^T)^T$ is an eigenvector of C, then x is a left singular vector of B and y is a right singular vector of B [5]. Thus we may attempt to find the eigendecomposition of C. After a symmetric interchange of rows and columns corresponding to the permutation (n+1, 1, n+2, 2, ..., 2n, n), C is a symmetric 4k-1 diagonal matrix. A (2k-1) x (4k-1) HI array can implement one step of the QR method with shifts for this matrix in n + O(k) time. In the complex case, both B and the permuted C have real outermost diagonals, so c,σ rotations can be used. Thus, although twice as much hardware is used, the time per iteration is 1/6 of that for the previous scheme.
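The bandwidth claim can be checked directly. This sketch uses helper names of our own and 0-based indices, so the permutation (n+1, 1, n+2, 2, ..., 2n, n) becomes (n, 0, n+1, 1, ...). It builds C, permutes it, and measures the half-bandwidth. For an upper triangular B with bandwidth k+1, the half-bandwidth is 2k-1, i.e. 4k-1 diagonals; with k = 1 this is the familiar tridiagonal form.

```python
def golub_kahan_matrix(B):
    """Return the 2n x 2n symmetric matrix C = [[0, B], [B^T, 0]]."""
    n = len(B)
    C = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            C[i][n + j] = B[i][j]
            C[n + j][i] = B[i][j]
    return C


def interleave(C):
    """Symmetric permutation (n+1, 1, n+2, 2, ..., 2n, n), written 0-based."""
    n = len(C) // 2
    perm = [p for i in range(n) for p in (n + i, i)]
    return [[C[p][q] for q in perm] for p in perm]


def half_bandwidth(M):
    """Largest |i - j| over the nonzero entries of M."""
    return max((abs(i - j) for i, row in enumerate(M)
                for j, v in enumerate(row) if v != 0.0), default=0)
```

The largest offset comes from $b_{i,i+k}$. The element $y_{i+k}$ moves to position 2(i+k) and $x_i$ moves to position 2i+1, so the two are 2k-1 apart.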


3  VLSI  IMPLEMENTATION 

Now we consider how to build the cells of the HI array. The fundamental unit we use in this construction is a multiply-add cell, whose function is

$$(w, x, y) \mapsto w + xy.$$

Outputs leave the cell one clock after inputs enter.

Although other primitive units (CORDIC blocks, for example) might be used, we feel that the multiply-add is a good basis for such an investigation. Currently, a floating-point multiply-add is about what can be integrated on a single chip. It is almost universally useful; indeed, the multiply-add pair is often the inner loop in numerical linear algebra computations. Even when larger cells and pieces of arrays can be integrated into single chips, designs based on the multiply-add primitive will be useful.

We shall discuss implementation of the HI array cells for complex data. The real case was discussed earlier [11], as were the cells of the trapezoidal array [12].


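The complex HI array triangularizes a banded input matrix using c,σ rotations of the form (1). A boundary cell generates each rotation, and an internal cell applies it to a pair (x, y) of matrix elements to produce (x', y') as in (1). The boundary cell receives a complex element x and the real element η to be annihilated, and computes

$$\eta' = (|x|^2 + \eta^2)^{1/2}, \qquad c = x/\eta', \qquad \sigma = \eta/\eta',$$

so that the rotation maps (x, η) to (η', 0).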

In the internal cell computation, 4 quantities are computed, each requiring 3 multiplies and 2 adds. Let $z_R$ and $z_I$ denote the real and imaginary parts of the complex quantity z. The computed values are

$$\begin{aligned} x'_R &= c_R x_R + c_I x_I + \sigma y_R, & x'_I &= c_R x_I - c_I x_R + \sigma y_I, \\ y'_R &= c_R y_R - c_I y_I - \sigma x_R, & y'_I &= c_R y_I + c_I y_R - \sigma x_I. \end{aligned}$$

Using 4 multiply-add chips we can construct a compound cell that gives these results in the least possible time, 3 clocks. We assume that complex quantities are represented in "word serial" form, with the real part preceding the imaginary part on the same data path. A schedule using 4 chips that achieves the minimum latency is shown in Table 1.

Table 1.  Schedule for Internal HI Cell

A second primitive, for divide and square root, is needed to implement the boundary cell. We assume that a chip for computing

$$(a, b) \mapsto a / b^{1/2}$$

is available. The boundary cell outputs are $\eta' = (|x|^2 + \eta^2)^{1/2}$, $c_R = x_R/\eta'$, $c_I = x_I/\eta'$, and $\sigma = \eta/\eta'$. With $b = |x|^2 + \eta^2$, all four have this form: $\eta' = b/b^{1/2}$, $c_R = x_R/b^{1/2}$, $c_I = x_I/b^{1/2}$, and $\sigma = \eta/b^{1/2}$. A compound cell using one multiply-add and two of these square root chips can produce results at the rate required to keep up with the internal cell. A schedule is shown in Table 2.

The overall array timing is now that of the "ideal" HI array, in which everything happens in a single cycle (of length 3 chip clocks). The cells are used 1/2 of the time, but two independent problems can be solved simultaneously, making full use of the hardware.

Table 2.  Schedule for HI Boundary Cell (columns: time, clocks 1 to 6; chips: I/O, mult-add, sqrt #1, sqrt #2)
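To show where the operation counts come from, the sketch below writes both cells in real arithmetic. `internal_cell_real` evaluates the four quantities with exactly 3 multiplies and 2 adds each, 12 in all. `boundary_cell_real` forms $b = x_R^2 + x_I^2 + \eta^2$ with three multiply-adds and then uses only the assumed divide/square-root primitive $(a, b) \mapsto a/b^{1/2}$. That primitive is modeled by the hypothetical function `inv_sqrt_scale`. All names are ours. The sketch models the arithmetic only, not the clock-by-clock schedules of Tables 1 and 2, and it assumes b > 0.

```python
import math


def internal_cell_real(xR, xI, yR, yI, cR, cI, sigma):
    """Rotation (1) in real arithmetic: 4 outputs, each 3 multiplies + 2 adds."""
    xpR = cR * xR + cI * xI + sigma * yR
    xpI = cR * xI - cI * xR + sigma * yI
    ypR = cR * yR - cI * yI - sigma * xR
    ypI = cR * yI + cI * yR - sigma * xI
    return xpR, xpI, ypR, ypI


def inv_sqrt_scale(a, b):
    """Hypothetical divide/square-root chip: (a, b) -> a / b**(1/2)."""
    return a / math.sqrt(b)


def boundary_cell_real(xR, xI, eta):
    """Generate (eta', cR, cI, sigma) for a c,sigma rotation; requires b > 0."""
    b = xR * xR + xI * xI + eta * eta   # three multiply-adds
    return (inv_sqrt_scale(b, b),       # eta' = b / b^(1/2)
            inv_sqrt_scale(xR, b),      # cR
            inv_sqrt_scale(xI, b),      # cI
            inv_sqrt_scale(eta, b))     # sigma
```

For x = 3 - 4i and η = 12, `boundary_cell_real(3.0, -4.0, 12.0)` gives (13, 3/13, -4/13, 12/13). Feeding those parameters and (x, y) = (3 - 4i, 12) to `internal_cell_real` gives (13, 0, 0, 0).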


REFERENCES 

[1]  H. C. Andrews and C. L. Patterson: "Singular Value Decomposition and Digital Image Processing", IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-24 (1976), pp. 26-53.

[2]  Richard P. Brent and Franklin T. Luk: "A Systolic Architecture for the Singular Value Decomposition", Report TR-CS-82-09, Department of Computer Science, The Australian National University, Canberra, 1982.

[3]  Alan M. Finn, Franklin T. Luk, and Christopher Pottle: "Systolic Array Computation of the Singular Value Decomposition", Real Time Signal Processing V, SPIE Vol. 341, Bellingham, Wash., Society of Photo-optical Instrumentation Engineers, 1982.

[4]  W. M. Gentleman and H. T. Kung: "Matrix Triangularization by Systolic Arrays", Real Time Signal Processing IV, SPIE Vol. 298, Bellingham, Wash., Society of Photo-optical Instrumentation Engineers, 1981.

[5]  Gene Golub: private communication.

[6]  G. H. Golub and F. T. Luk: "Singular Value Decomposition: Applications and Computations", ARO Report 77-1, Trans. of the 22nd Conf. of Army Mathematicians (1977), pp. 577-605.

[7]  G. H. Golub and C. Reinsch: "Singular Value Decomposition and Least Squares Solutions", Numer. Math. 14 (1970), pp. 403-420.




